Extracting and Visualizing Quotations from News Wires
Authors
Abstract
We introduce SAPIENS, a platform for extracting quotations from news wires, each associated with its author and context. The originality of SAPIENS is that it relies on a deep linguistic processing chain, which allows it to extract quotations with wide coverage and under an extended definition that includes quotations in which only part of the text is a verbatim transcript delimited by quotation marks. We describe the architecture of SAPIENS and how it was applied to a corpus of French news wires from the AFP news agency.
Similar resources
A Lexicon of French Quotation Verbs for Automatic Quotation Extraction
Quotation extraction is an important information extraction task, especially when dealing with news wires. Quotations can be found in various configurations. In this paper, we focus on direct quotations introduced by a parenthetical clause, headed by a “quotation verb”. Our study is based on a large French news wire corpus from the Agence France-Presse. We introduce and motivate an analysis at ...
Visualizing Topical Quotations Over Time to Understand News Discourse
We present the PICTOR browser, a visualization designed to facilitate the analysis of quotations about user-specified topics in large collections of news text. PICTOR focuses on quotations because they are a major vehicle of communication in the news genre. It extracts quotes from articles that match a user’s text query, and groups these quotes into “threads” that illustrate the development of s...
Automatically Detecting and Attributing Indirect Quotations
Direct quotations are used for opinion mining and information extraction as they have an easy to extract span and they can be attributed to a speaker with high accuracy. However, simply focusing on direct quotations ignores around half of all reported speech, which is in the form of indirect or mixed speech. This work presents the first large-scale experiments in indirect and mixed quotation ex...
Information Extraction and Interactive Visualization of Road Accident Related News
This paper describes a strategy for extracting information from raw data and visualizing it in a web browser. The raw data are collected from newspapers and are in English. Specific information is extracted by a text mining process, which is explained in detail. The derived information concerns road accident news specifically, although the raw data contain all kinds of news. ...
Content Collection and Analysis in the Domain of Epidemiology
We describe a system that tracks the spread of epidemics by automatically extracting content from the Web. The system continuously monitors a large set of news sources, extracts information from new articles, and accumulates the extracted facts in a database in real time. The system provides functionality for visualizing results, as well as alerting capability. We present the current state of t...